我们提出了一个端到端的讲座视频生成系统,该系统可以直接从注释的幻灯片,讲师的参考语音和讲师的参考肖像视频中生成现实和完整的讲座视频。我们的系统主要由语音合成模块组成,具有很少的扬声器适应器和基于对抗性学习的说话头生成模块。它不仅能够减少讲师的工作量,还可以改变语言和口音,这可以帮助学生更轻松地跟随讲座,并能够更广泛地传播讲座内容。我们的实验结果表明,所提出的模型在真实性,自然性和准确性方面优于其他当前方法。这是一个视频演示,展示了我们的系统的工作原理以及评估和比较的结果:https://youtu.be/cy6tyki0cog。
translated by 谷歌翻译
The rapid development of remote sensing technologies have gained significant attention due to their ability to accurately localize, classify, and segment objects from aerial images. These technologies are commonly used in unmanned aerial vehicles (UAVs) equipped with high-resolution cameras or sensors to capture data over large areas. This data is useful for various applications, such as monitoring and inspecting cities, towns, and terrains. In this paper, we presented a method for classifying and segmenting city road traffic dashed lines from aerial images using deep learning models such as U-Net and SegNet. The annotated data is used to train these models, which are then used to classify and segment the aerial image into two classes: dashed lines and non-dashed lines. However, the deep learning model may not be able to identify all dashed lines due to poor painting or occlusion by trees or shadows. To address this issue, we proposed a method to add missed lines to the segmentation output. We also extracted the x and y coordinates of each dashed line from the segmentation output, which can be used by city planners to construct a CAD file for digital visualization of the roads.
translated by 谷歌翻译
Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
translated by 谷歌翻译
Though impressive success has been witnessed in computer vision, deep learning still suffers from the domain shift challenge when the target domain for testing and the source domain for training do not share an identical distribution. To address this, domain generalization approaches intend to extract domain invariant features that can lead to a more robust model. Hence, increasing the source domain diversity is a key component of domain generalization. Style augmentation takes advantage of instance-specific feature statistics containing informative style characteristics to synthetic novel domains. However, all previous works ignored the correlation between different feature channels or only limited the style augmentation through linear interpolation. In this work, we propose a novel augmentation method, called \textit{Correlated Style Uncertainty (CSU)}, to go beyond the linear interpolation of style statistic space while preserving the essential correlation information. We validate our method's effectiveness by extensive experiments on multiple cross-domain classification tasks, including widely used PACS, Office-Home, Camelyon17 datasets and the Duke-Market1501 instance retrieval task and obtained significant margin improvements over the state-of-the-art methods. The source code is available for public use.
translated by 谷歌翻译
Time-of-flight (ToF) distance measurement devices such as ultrasonics, LiDAR and radar are widely used in autonomous vehicles for environmental perception, navigation and assisted braking control. Despite their relative importance in making safer driving decisions, these devices are vulnerable to multiple attack types including spoofing, triggering and false data injection. When these attacks are successful they can compromise the security of autonomous vehicles leading to severe consequences for the driver, nearby vehicles and pedestrians. To handle these attacks and protect the measurement devices, we propose a spatial-temporal anomaly detection model \textit{STAnDS} which incorporates a residual error spatial detector, with a time-based expected change detection. This approach is evaluated using a simulated quantitative environment and the results show that \textit{STAnDS} is effective at detecting multiple attack types.
translated by 谷歌翻译
We propose a novel model agnostic data-driven reliability analysis framework for time-dependent reliability analysis. The proposed approach -- referred to as MAntRA -- combines interpretable machine learning, Bayesian statistics, and identifying stochastic dynamic equation to evaluate reliability of stochastically-excited dynamical systems for which the governing physics is \textit{apriori} unknown. A two-stage approach is adopted: in the first stage, an efficient variational Bayesian equation discovery algorithm is developed to determine the governing physics of an underlying stochastic differential equation (SDE) from measured output data. The developed algorithm is efficient and accounts for epistemic uncertainty due to limited and noisy data, and aleatoric uncertainty because of environmental effect and external excitation. In the second stage, the discovered SDE is solved using a stochastic integration scheme and the probability failure is computed. The efficacy of the proposed approach is illustrated on three numerical examples. The results obtained indicate the possible application of the proposed approach for reliability analysis of in-situ and heritage structures from on-site measurements.
translated by 谷歌翻译
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both the industry and academia. However, most co-design frameworks either explore a limited search space or employ suboptimal exploration techniques for simultaneous design decision investigations of the ML model and the accelerator. Furthermore, training the ML model and simulating the accelerator performance is computationally expensive. To address these limitations, this work proposes a novel neural architecture and hardware accelerator co-design framework, called CODEBench. It is composed of two new benchmarking sub-frameworks, CNNBench and AccelBench, which explore expanded design spaces of convolutional neural networks (CNNs) and CNN accelerators. CNNBench leverages an advanced search technique, BOSHNAS, to efficiently train a neural heteroscedastic surrogate model to converge to an optimal CNN architecture by employing second-order gradients. AccelBench performs cycle-accurate simulations for a diverse set of accelerator architectures in a vast design space. With the proposed co-design method, called BOSHCODE, our best CNN-accelerator pair achieves 1.4% higher accuracy on the CIFAR-10 dataset compared to the state-of-the-art pair, while enabling 59.1% lower latency and 60.8% lower energy consumption. On the ImageNet dataset, it achieves 3.7% higher Top1 accuracy at 43.8% lower latency and 11.2% lower energy consumption. CODEBench outperforms the state-of-the-art framework, i.e., Auto-NBA, by achieving 1.5% higher accuracy and 34.7x higher throughput, while enabling 11.0x lower energy-delay product (EDP) and 4.0x lower chip area on CIFAR-10.
translated by 谷歌翻译
Point Cloud Registration is the problem of aligning the corresponding points of two 3D point clouds referring to the same object. The challenges include dealing with noise and partial match of real-world 3D scans. For non-rigid objects, there is an additional challenge of accounting for deformations in the object shape that happen to the object in between the two 3D scans. In this project, we study the problem of non-rigid point cloud registration for use cases in the Augmented/Mixed Reality domain. We focus our attention on a special class of non-rigid deformations that happen in rigid objects with parts that move relative to one another about joints, for example, robots with hands and machines with hinges. We propose an efficient and robust point-cloud registration workflow for such objects and evaluate it on real-world data collected using Microsoft Hololens 2, a leading Mixed Reality Platform.
translated by 谷歌翻译
Robots have been steadily increasing their presence in our daily lives, where they can work along with humans to provide assistance in various tasks on industry floors, in offices, and in homes. Automated assembly is one of the key applications of robots, and the next generation assembly systems could become much more efficient by creating collaborative human-robot systems. However, although collaborative robots have been around for decades, their application in truly collaborative systems has been limited. This is because a truly collaborative human-robot system needs to adjust its operation with respect to the uncertainty and imprecision in human actions, ensure safety during interaction, etc. In this paper, we present a system for human-robot collaborative assembly using learning from demonstration and pose estimation, so that the robot can adapt to the uncertainty caused by the operation of humans. Learning from demonstration is used to generate motion trajectories for the robot based on the pose estimate of different goal locations from a deep learning-based vision system. The proposed system is demonstrated using a physical 6 DoF manipulator in a collaborative human-robot assembly scenario. We show successful generalization of the system's operation to changes in the initial and final goal locations through various experiments.
translated by 谷歌翻译
We present DyFOS, an active perception method that Dynamically Finds Optimal States to minimize localization error while avoiding obstacles and occlusions. We consider the scenario where a ground target without any exteroceptive sensors must rely on an aerial observer for pose and uncertainty estimates to localize itself along an obstacle-filled path. The observer uses a downward-facing camera to estimate the target's pose and uncertainty. However, the pose uncertainty is a function of the states of the observer, target, and surrounding environment. To find an optimal state that minimizes the target's localization uncertainty, DyFOS uses a localization error prediction pipeline in an optimization search. Given the states mentioned above, the pipeline predicts the target's localization uncertainty with the help of a trained, complex state-dependent sensor measurement model (which is a probabilistic neural network in our case). Our pipeline also predicts target occlusion and obstacle collision to remove undesirable observer states. The output of the optimization search is an optimal observer state that minimizes target localization uncertainty while avoiding occlusion and collision. We evaluate the proposed method using numerical and simulated (Gazebo) experiments. Our results show that DyFOS is almost 100x faster than yet as good as brute force. Furthermore, DyFOS yielded lower localization errors than random and heuristic searches.
translated by 谷歌翻译